1. ABSTRACT

Cuckoo hashing [J is a multiple choice hashing scheme in which each item can be placed in 
multiple locations, and collisions are resolved by moving items to their alternative locations. In 
the classical implementation of two-way cuckoo hashing, the memory is partitioned into contiguous 
disjoint fixed-size buckets. Each item is hashed to two buckets, and may be stored in any of the 
positions within those buckets. Ref. [2] analyzed a variation in which the buckets are contiguous 
and overlap. However, many systems retrieve data from secondary storage in same-size blocks 
called pages. Fetching a page is a relatively expensive process; but once a page is fetched, its 
contents can be accessed orders of magnitude faster. We utilize this property of memory retrieval, 
presenting a variant of cuckoo hashing incorporating the following constraint: each bucket must 
be fully contained in a single page, but buckets are not necessarily contiguous. Empirical results 
show that this modification increases memory utilization and decreases the number of iterations 
required to insert an item. If each item is hashed to two buckets of capacity two, the page size is 
8, and each bucket is fully contained in a single page, the memory utilization equals 89.71% in the 
classical contiguous disjoint bucket variant, 93.78% in the contiguous overlapping bucket variant, 
and increases to 97.46% in our new non-contiguous bucket variant. When the memory utilization is 
92% and we use breadth first search to look for a vacant position, the number of iterations required 
to insert a new item is dramatically reduced from 545 in the contiguous overlapping buckets variant 
to 52 in our new non-contiguous bucket variant. In addition to the empirical results, we present a 
theoretical lower bound on the memory utilization of our variation as a function of the page size. 

2. Introduction 

Cuckoo hashing [J is a multiple choice hashing scheme in which each item can be placed in 
multiple locations, and collisions are resolved by moving items to their alternative locations. This 
hashing scheme resembles the cuckoo's nesting habits: the cuckoo lays its eggs in other birds' 
nests. When the cuckoo chick hatches, it pushes the other eggs out of the nest. Hence the name 
"cuckoo hashing." As Ref. [5] explains, analysis of hashing is similar to the analysis of balls and 
bins. Hashing an item to a memory location corresponds to throwing a ball into a bin. Insights 
from balls and bins processes led to breakthroughs in hashing methods. For example, if we throw 
n balls into n bins independently and uniformly, it is highly probable that the largest bin will 
get $(1 + o(1)) \log n / \log\log n$ balls. Azar et al. [10] found that if each ball selects two bins
independently and uniformly at random, and is placed in the bin with fewer balls, the final distribution is
much more uniform. This led to hashing each item to one of two possible buckets, decreasing the
load on the most-loaded bucket to $\log\log n + O(1)$ with high probability. In general, if each item
is hashed into $d \ge 2$ buckets, the maximum load decreases to $\log\log n / \log d + O(1)$.
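A quick simulation shows the effect of a second choice. The sketch below is our own illustration. The function name `max_load` and the seeds are hypothetical choices, not taken from the paper. It throws n balls into n bins using one or two uniform choices per ball and reports the fullest bin. The exact numbers depend on the seed; the point is the gap between the two cases.

```python
import random


def max_load(n: int, choices: int, seed: int = 0) -> int:
    """Throw n balls into n bins. Each ball samples `choices` bins uniformly at
    random (with replacement) and joins the least loaded one."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(choices)]
        target = min(candidates, key=loads.__getitem__)  # ties: first sampled bin
        loads[target] += 1
    return max(loads)


if __name__ == "__main__":
    n = 100_000
    print("one choice :", max_load(n, 1))
    print("two choices:", max_load(n, 2))
```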

Cuckoo hashing [4] is an extension of two-way hashing. Each item is hashed to a few possible
buckets, and existing items may be moved to their alternate buckets in order to free space for a 
new item. There are many variants of cuckoo hashing. The goals of cuckoo hashing are to increase 
memory utilization (the number of items that can be successfully hashed to a given memory size) 
and to decrease insertion complexity. Pagh and Rodler [4] analyzed hashing of each item to d = 2
buckets of capacity k = 1, and demonstrated that moving items during inserts results in 50% space
utilization with high probability. Fotakis et al. [5] analyzed hashing of each item into more than
two buckets. Ref. [6] analyzed a practically important case in which each item is hashed to d = 2
buckets of capacity k = 2. Refs. [7], [8] found tight memory utilization thresholds for d = 2 buckets
of any size $k \ge 2$. Specifically, they proved that the memory utilization for d = 2 and k = 2 is
89.7%.

Ref. [T] proved that the maximum memory utilization thresholds for $d \ge 3$ and k = 1 are equal
to the previously known thresholds for the random k-XORSAT problem. Refs. [12], [13] developed a
tight formula for memory utilization for any $d \ge 3$ and k = 1, and Ref. extended the formula
to any $d \ge 3$ and k > 1.

Comment added. While this work was being completed, we became aware of Ref. [14], which
proposed a model where the memory is divided into pages and each key has several possible locations 
on a single page as well as additional choices on a second backup page. They provide interesting 
experimental results. 

In a classical implementation of two-way cuckoo hashing, the memory is partitioned into contiguous
disjoint fixed-size buckets of size k. Each item is hashed to 2 buckets and may be
stored in any of the 2k locations within those buckets. Ref. [2] analyzes a variation in which the
buckets overlap. For example, if the bucket capacity k is 3, the disjoint bucket memory locations
are {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, ..., whereas the overlapping bucket memory locations are
{0, 1, 2}, {1, 2, 3}, {2, 3, 4}, ....

Their empirical results show that this variation increases memory utilization from 89.7% to 96.5% 
for d = 2 and k = 2. However, many systems retrieve data from secondary storage in same-size
blocks called pages. Fetching a page is a relatively expensive process, but once a page is fetched, 
its contents can be accessed orders of magnitude more quickly. We utilize this property of memory 
retrieval to present a variant of cuckoo hashing requiring that each bucket be fully contained in a 
single page but buckets are not necessarily contiguous. 

In this paper we compare the following three variants of cuckoo hashing: 

(1) CUCKOO-CHOOSE-K: the algorithm introduced in this paper. The buckets are any k
cells in a page, not necessarily in contiguous locations. There are $\binom{t}{k}$ buckets in a page,
where t is the size of the page.

(2) CUCKOO-OVERLAP [2]: The buckets are contiguous and overlap. Here we assume that
all buckets are fully contained in a single page, so there are $t - k + 1$ buckets in a page. This
is a generalization of Ref. [2]. Originally Ref. [2] did not consider dividing the memory
into pages.

(3) CUCKOO-DISJOINT [1]: The buckets are contiguous and not overlapping. There are t/k
buckets in a page. This is a generalization of Ref. [4]. Originally Ref. [1] did not consider
larger buckets. 

Note that algorithm CUCKOO-DISJOINT is the extreme case of the CUCKOO-OVERLAP and
CUCKOO-CHOOSE-K algorithms when the size of the page t equals the size of the bucket k.
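The following sketch makes the three bucket layouts concrete by enumerating the buckets inside a single page of t cells. The function `buckets_in_page` and the variant labels are hypothetical names chosen for this illustration. For t = 8 and k = 2, it reproduces the counts t/k, $t - k + 1$ and $\binom{t}{k}$ stated above. When t = k, all three variants reduce to one bucket per page.

```python
from __future__ import annotations

from itertools import combinations


def buckets_in_page(t: int, k: int, variant: str) -> list[tuple[int, ...]]:
    """List the k-cell buckets inside one page of t cells (offsets 0..t-1)."""
    if variant == "disjoint":   # CUCKOO-DISJOINT: t/k non-overlapping runs (k divides t)
        return [tuple(range(s, s + k)) for s in range(0, t, k)]
    if variant == "overlap":    # CUCKOO-OVERLAP: every contiguous window of k cells
        return [tuple(range(s, s + k)) for s in range(t - k + 1)]
    if variant == "choose":     # CUCKOO-CHOOSE-K: any k cells of the page
        return list(combinations(range(t), k))
    raise ValueError(f"unknown variant: {variant}")


for name in ("disjoint", "overlap", "choose"):
    print(name, len(buckets_in_page(8, 2, name)))   # 4, 7 and 28 buckets
```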

We prove theoretically and present empirical evidence that our CUCKOO-CHOOSE-K modification
increases memory utilization. Moreover, using the classical cuckoo hashing scheme, an
item insertion requires multiple look-ups of candidates to displace. Empirical results show that our
modification dramatically decreases the number of candidate look-ups required to insert an item
compared to Ref. [2]. In the overlapping buckets variant [2], some buckets are split between two
pages, so that each item resides in up to 2d pages. In our variant, each bucket is fully contained in
a single page.

An appealing experimental result is that CUCKOO-CHOOSE-K memory utilization converges
very quickly as a function of the page size t. When k = 2 and t = 16, memory utilization is
0.9763. This value is almost identical to the memory utilization when t = n = 1,209,600, which equals 0.9767.
Symbol        Description                                   Comments
n             number of vertices (hash table capacity)     n → ∞
m             number of edges (hashed items)               m < n, m → ∞
d             number of buckets each item is hashed to     typical value: 2; dk > 2
k             bucket size                                  typical value: 2-3; dk > 2
t             page size                                    t ≥ k; k divides t; t divides n
g             number of pages                              g = n/t
β             memory utilization                           β = m/n, 0 < β < 1
V             a set of vertices                            |V| = v
S = S(V,E)    a sub-graph
v             v = |V|                                      dk ≤ v ≤ m (if v > m or v < dk, then S cannot fail)
x             x = v/n                                      dk/n ≤ x ≤ β
ε             a small constant                             0 < ε ≪ 1
δ             a small constant                             0 < δ ≪ 1

Table 1. Parameter names and descriptions



CUCKOO-OVERLAP memory utilization when t = 16 is 0.9494, and CUCKOO-DISJOINT memory
utilization is only 0.8970. If we allow a tiny gap of one cell inside the buckets (t = 3), memory
utilization increases from 0.9229 (CUCKOO-OVERLAP) to 0.9480 (CUCKOO-CHOOSE-K). Table
1 specifies the parameters used in our analysis.

3. Theoretical analysis 

We can determine the success of cuckoo hashing by analyzing the cuckoo hypergraph. The
vertices of the graph are the memory locations. The hyperedges of the graph connect all the
memory locations where each item could be placed. Recall that each item can be placed in d 
buckets chosen uniformly and independently of other items. Each bucket is composed of any k 
locations in a page. 

It is well known (see, e.g., Ref. [2] for a proof) that a cuckoo hash fails if and only if there is a
sub-graph S with v vertices and more than v edges. We say that a sub-graph S has failed if it has 
more edges than vertices. 
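The failure criterion can be checked directly on tiny instances. In the sketch below, which is our own illustration, each item is represented by the set of cells in its d buckets. The names `can_place_all` and `has_overfull_subgraph` are hypothetical. Placing every item in a distinct cell is a bipartite matching problem. By Hall's theorem, a complete matching exists exactly when no set of r items touches fewer than r cells, which is the sub-graph condition above. The brute-force check is exponential and is only meant for cross-validation.

```python
from __future__ import annotations

import random
from itertools import combinations


def can_place_all(cells_of: list[set[int]]) -> bool:
    """True iff every item can get its own cell (bipartite matching found
    with Kuhn's augmenting-path algorithm)."""
    owner: dict[int, int] = {}  # cell -> item stored there

    def augment(item: int, seen: set[int]) -> bool:
        for c in cells_of[item]:
            if c in seen:
                continue
            seen.add(c)
            # Free cell, or its occupant can be moved somewhere else.
            if c not in owner or augment(owner[c], seen):
                owner[c] = item
                return True
        return False

    return all(augment(i, set()) for i in range(len(cells_of)))


def has_overfull_subgraph(cells_of: list[set[int]]) -> bool:
    """Brute force: do some r items (edges) touch fewer than r cells (vertices)?
    Exponential in the number of items; for tiny instances only."""
    m = len(cells_of)
    for r in range(1, m + 1):
        for items in combinations(range(m), r):
            if len(set().union(*(cells_of[i] for i in items))) < r:
                return True
    return False


rng = random.Random(7)
for _ in range(300):
    m = rng.randint(1, 7)
    # d = 2 buckets of k = 2 cells each, drawn from 6 cells
    inst = [set(rng.sample(range(6), 2)) | set(rng.sample(range(6), 2)) for _ in range(m)]
    assert can_place_all(inst) == (not has_overfull_subgraph(inst))
```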

We will begin by analyzing the probability of success of CUCKOO-CHOOSE-K for the case 
where the page size t equals the array size n. This simple and special case is presented here to 
introduce the main ideas applied in the following section, where we analyze the general case where 
the page size is a finite constant (independent of n).

3.1. Memory utilization when the page size t equals the array size n. An analysis of 
memory utilization has been performed previously in [5]. In their analysis, they assume k = 1 and
prove that if $d > 2(1+\epsilon)\ln(e/\epsilon)$, then the hashing will be successful with a probability of at least
$1 - O(n^{4-2d})$. Here we derive a similar constraint on memory utilization β. We solve the constraint
numerically for different values of k and d, and obtain a lower bound on possible memory utilization 
for the specified values. We perform the analysis using a modification of the method in [5], which
we will later generalize to page sizes equal to any given constant.

We will bound the failure probability using the union bound. But first, we would like to reduce 
redundant summations. We observe that: 
• If there exists a sub-graph $S(V,E)$ with $|E| > |V|$ and there exists an edge that has exactly
one vertex $v_0$ outside of V, then the sub-graph $S'(V' = V \cup \{v_0\}, E')$ also has more edges
than vertices, since $|E'| \ge |E| + 1 > |V| + 1 = |V'|$.

• If there exists a sub-graph $S(V,E)$ such that $|V| = v$ and $|E| > v + 1$, then there exists a
sub-graph $S'(V',E')$ such that $|E'| = |V'| + 1$. We can find such a sub-graph simply by
adding vertices to V one by one until we get a sub-graph where the number of edges equals
the number of vertices plus one.

For each sub-graph S we define an indicator variable $Z_S$:

$$Z_S = \begin{cases} 1 & \text{if } S(V,E) \text{ has } |V| = v \text{ vertices and } |E| = v+1 \text{ edges, and there is no edge that connects } V \text{ to exactly one vertex from outside of } V,\\ 0 & \text{otherwise.} \end{cases}$$

If $Z_S = 1$, we say that S is a bad sub-graph; otherwise we say that S is a good
sub-graph. If the sum of $Z_S$ over all sub-graphs is zero, then every sub-graph is good and
the cuckoo hash succeeds. We will find the memory utilization β such that the expected sum of $Z_S$ over
all sub-graphs is o(1) as $n \to \infty$.

Let $p_{hit}(v)$ be the probability that a random edge hits V (all of its vertices lie in V). Let $p_1(v)$ be the probability that a random
edge connects V to exactly one vertex from outside of V, and let $p_{bad}(v)$ be the probability that a given
sub-graph S(V,E) is bad.

Lemma 1. $p_{bad}(v) = \binom{m}{v+1} p_{hit}^{v+1} (1 - p_1 - p_{hit})^{m-(v+1)}$.

Proof. Immediate from the definition of $Z_S$. For S to be bad, exactly v + 1 edges out of m edges
must hit V, and all the rest must miss V and must not connect V to exactly one vertex from outside
of V. □

Lemma 2. The probability that a random edge hits V is $p_{hit}(v) = \left(\binom{v}{k} \Big/ \binom{n}{k}\right)^d$.

Proof. Each item is hashed independently to d buckets and the size of each bucket is k. The number
of buckets in V is therefore $\binom{v}{k}$ and the total number of buckets is $\binom{n}{k}$. □

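Lemma 3. The probability that a random edge connects V to exactly one vertex from outside of V
is

$$p_1(v) = d\,\frac{\binom{v}{k-1}(n-v)}{\binom{n}{k}}\left(\frac{\binom{v}{k}}{\binom{n}{k}}\right)^{d-1}.$$

Proof. d − 1 buckets must all fall in V, and the remaining bucket (any one of the d) must contain any k − 1 vertices from V and
any one of the n − v vertices from outside of V. □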
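For t = n, the expressions of Lemmas 2 and 3 are simple enough to evaluate exactly. The sketch below uses Python's `math.comb` and the hypothetical helper names `p_hit` and `p_one`. It computes both expressions and checks numerically the elementary bounds $(x - k/n)^{dk} \le p_{hit} \le x^{dk}$ and $p_1 \ge dk(1-x)(x - k/n)^{dk-1}$ with x = v/n. These bounds follow from comparing the binomial ratios factor by factor.

```python
from math import comb


def p_hit(v: int, n: int, k: int, d: int) -> float:
    """Lemma 2 (t = n): each of the d buckets is a k-subset lying inside V."""
    return (comb(v, k) / comb(n, k)) ** d


def p_one(v: int, n: int, k: int, d: int) -> float:
    """Lemma 3 (t = n): d - 1 buckets inside V, the remaining one with
    k - 1 cells in V and one cell outside (any of the d buckets can be it)."""
    inside = comb(v, k) / comb(n, k)
    one_out = comb(v, k - 1) * (n - v) / comb(n, k)
    return d * one_out * inside ** (d - 1)


n, k, d = 1000, 2, 2
for v in (10, 300, 500, 900):
    x = v / n
    ph, p1 = p_hit(v, n, k, d), p_one(v, n, k, d)
    assert (x - k / n) ** (d * k) <= ph <= x ** (d * k)
    assert d * k * (1 - x) * (x - k / n) ** (d * k - 1) <= p1
    assert ph + p1 <= 1
```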

Let $P_{bad}(v)$ be the probability that there exists a sub-graph S with v vertices such that S is bad.
By the union bound, $P_{bad}(v) \le N(v) \cdot p_{bad}(v)$, where $N(v) = \binom{n}{v}$ is the number of sub-graphs
with v vertices. We are going to analyze $P_{bad}(v)$ as $n \to \infty$. If $P_{bad}(v) = o(1/n)$ for all v,
then $\sum_{v=dk}^{m} P_{bad}(v) \le o(1)$ and the cuckoo hash succeeds with high probability. The analysis is
similar to the analysis given in [5]. Let $x_0 = \exp\left(-\frac{2}{dk-2}\right)$. We divide the analysis into two sections.

In Section 3.1.1 we show that for any memory utilization β and $dk/n \le x \le x_0$, $P_{bad}(x)$ is $o(1/n)$.
In Section 3.1.2 we find the maximum memory utilization β such that for all $x_0 \le x \le \beta$, $P_{bad}(x)$ is
exponentially small.

3.1.1. $P_{bad}(x)$ Analysis for $dk/n \le x \le x_0$. In this section we show that if $dk \ge 3$, then for any
memory utilization β and $dk/n \le x \le x_0$, $P_{bad}(x)$ is $o(1/n)$.

Lemma 4. $P_{bad}(x) \le c_0(x) \cdot c_1^n(x)$,

where $c_0(x) = e \cdot x^{dk-1}$ and $c_1(x) = e^{2x} \cdot x^{(dk-2)x}$.

Proof. See Appendix A. □
Theorem 5. Let δ be a small constant, $0 < \delta \ll 1$. If $dk \ge 3$, then for any load β:
1. If $dk/n \le x \le dk/n + \delta$, then $P_{bad}(x) \le O\left(n^{-((dk)^2 - dk - 1)}\right) = o(1/n)$.
2. If $dk/n + \delta \le x \le x_0 - \delta$, then $P_{bad}(x)$ decreases exponentially as $n \to \infty$.

Proof. For any memory utilization β, $P_{bad}(x) \le c_0(x) \cdot c_1^n(x)$,
where $c_0(x) = e \cdot x^{dk-1}$ and $c_1(x) = e^{2x} \cdot x^{(dk-2)x}$.

Note that $c_0(x)$ and $c_1(x)$ are independent of β. Recall that $x_0 = \exp\left(-\frac{2}{dk-2}\right)$. Since $c_1(x) = \left(e^2 x^{dk-2}\right)^x$, for $dk/n + \delta \le x \le x_0 - \delta$
there exists a constant ε such that $c_1(x) < 1 - \epsilon$. We obtain that:

1. If $x \approx dk/n$, then $P_{bad}(x) \le \lim_{n\to\infty} c_0(x) \cdot c_1^n(x) = O\left(n^{-((dk)^2-dk-1)}\right) \le O(n^{-5}) = o(1/n)$.

2. If $dk/n + \delta \le x \le x_0 - \delta$, then $c_1(x) < 1 - \epsilon$, and $P_{bad}(x) \le c_0(x) \cdot c_1^n(x)$ decreases exponentially
as $n \to \infty$. □
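The threshold $x_0$ is the point where $c_1$ crosses one. Writing $c_1(x) = (e^2 x^{dk-2})^x$ shows that $c_1(x) < 1$ exactly when $x < \exp(-2/(dk-2))$. The small numerical sketch below illustrates this. The helper names `c1` and `x0` are ours.

```python
import math


def c1(x: float, d: int, k: int) -> float:
    """c1(x) = e^{2x} * x^{(dk-2)x} from Lemma 4."""
    return math.exp(2 * x) * x ** ((d * k - 2) * x)


def x0(d: int, k: int) -> float:
    """c1(x) = (e^2 x^{dk-2})^x < 1 iff x < exp(-2/(dk-2)); needs dk >= 3."""
    if d * k < 3:
        raise ValueError("need dk >= 3")
    return math.exp(-2 / (d * k - 2))


for d, k in [(2, 2), (2, 3), (3, 1)]:
    print(f"d={d} k={k}  x0={x0(d, k):.4f}")   # 0.3679, 0.6065, 0.1353
```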

3.1.2. $P_{bad}(x)$ Analysis for $x_0 \le x \le \beta$. In this section we are going to find the maximum memory
utilization β such that for all $x_0 \le x \le \beta$, the probability that there exists a bad sub-graph is
exponentially small.

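Lemma 6. For all $x_0 \le x \le \beta$, $P_{bad}(x) \le O(1) \cdot c_5^n(x,\beta)$, where

$$c_5(x,\beta) = \left(\frac{1}{1-x}\right)^{1-x}\left(\frac{1}{x}\right)^{x}\left(\frac{\beta}{\beta-x}\right)^{\beta-x}\left(\frac{\beta}{x}\right)^{x} x^{dkx}\left(1 - dk(1-x)x^{dk-1} - x^{dk}\right)^{\beta-x}.$$

Proof. See Appendix B. □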

Theorem 7. If $c_5(x,\beta) < 1 - \epsilon$ for $x_0 \le x \le \beta$, then $P_{bad}(x)$ decreases exponentially as
$n \to \infty$. Any memory utilization β that satisfies the constraint is a lower bound on the possible
memory utilization.

Proof. The theorem follows directly from the inequality $P_{bad}(x) \le O(1) \cdot c_5^n(x,\beta)$. □

Numerical solutions to the constraint $c_5(x,\beta) < 1$ indicate that the memory utilization of the
CUCKOO-CHOOSE-K algorithm is $\beta_{choose-2}(k = 2, d = 2, t = n) \ge 0.937$ and $\beta_{choose-3}(k = 3, d =
2, t = n) \ge 0.993$. Our empirical results show that $\beta_{choose-2}(k = 2, d = 2, t = n) = 0.9768$
and $\beta_{choose-3}(k = 3, d = 2, t = n) = 0.9974$. The memory utilization for $kd \ge 6$ rapidly
approaches one. Theoretical analysis performed by [8] [7] provided tight thresholds of the memory
utilization of the CUCKOO-DISJOINT algorithm: $\beta_{disjoint}(k = 2, d = 2, t = n) = 0.8970$ and
$\beta_{disjoint}(k = 3, d = 2, t = n) = 0.9592$. Ref. [2] does not provide a theoretical memory utilization
threshold for small k. The empirical results of Ref. [2] show that in the CUCKOO-OVERLAP
algorithm $\beta_{overlap}(k = 2, d = 2, t = n) = 0.9650$ and $\beta_{overlap}(k = 3, d = 2, t = n) = 0.9945$. The
theoretical analysis of the CUCKOO-DISJOINT algorithm performed in [12] [T] [13] does not apply
for k > 1, and the theoretical analysis of the CUCKOO-DISJOINT algorithm performed by [11]
does not apply for d < 3.
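The lower bounds quoted above come from a numerical search. The sketch below shows one way to reproduce such a search under explicit assumptions. It uses the expression for $c_5(x,\beta)$ from Lemma 6, spelled out term by term in `log_c5`. It checks $c_5 < 1$ on a finite grid of $x \in [x_0, \beta]$ and bisects on β, assuming the check is monotone in β. The grid size, tolerance and bisection are our own choices, not the authors' procedure. The result is therefore a numerical estimate rather than a certified bound, and it need not match the quoted digits exactly.

```python
import math


def log_c5(x: float, beta: float, d: int, k: int) -> float:
    """Natural log of c5(x, beta) from Lemma 6 (terms with exponent beta - x
    vanish at x = beta)."""
    dk = d * k
    val = -(1 - x) * math.log(1 - x) - x * math.log(x)     # (1/(1-x))^(1-x) (1/x)^x
    val += x * math.log(beta / x) + dk * x * math.log(x)    # (beta/x)^x * x^(dk x)
    if beta > x:
        rest = 1 - dk * (1 - x) * x ** (dk - 1) - x ** dk    # 1 - p_1 - p_hit
        val += (beta - x) * (math.log(beta / (beta - x)) + math.log(rest))
    return val


def constraint_holds(beta: float, d: int, k: int, grid: int = 2000) -> bool:
    """Check c5(x, beta) < 1 on a uniform grid of x in [x0, beta]."""
    x0 = math.exp(-2 / (d * k - 2))
    xs = (x0 + (beta - x0) * i / grid for i in range(grid + 1))
    return all(log_c5(x, beta, d, k) < 0 for x in xs)


def beta_lower_bound(d: int, k: int, tol: float = 1e-4) -> float:
    """Bisection for the largest beta passing the grid check (assumes the
    check is monotone in beta, which we have not verified in general)."""
    lo = math.exp(-2 / (d * k - 2)) + 1e-6   # just above x0
    hi = 0.9999
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if constraint_holds(mid, d, k):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print("k=2, d=2:", round(beta_lower_bound(2, 2), 4))
    print("k=3, d=2:", round(beta_lower_bound(2, 3), 4))
```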

3.2. Memory utilization when the page size t is a given constant. In this section we
analyze the probability that the hashing fails for the case where the page size t equals a constant.
Let $P_{fail}(v)$ be the probability that there exists a sub-graph S with v vertices such that S has more
edges than vertices and every vertex is in at least one edge.

Let $x_1 = \exp\left(-\frac{k+1}{(d-1)k-1}\right)$. Here again we divide the analysis into two sections. In Section 3.2.1

we show that for any memory utilization β and $dk/n \le x \le x_1$, $P_{fail}(x)$ is $o(1/n)$. In Section 3.2.2
we find the maximum memory utilization β such that for all $x_1 \le x \le \beta$, $P_{bad}(x)$ is exponentially small.

3.2.1. $P_{fail}(x)$ Analysis for $dk/n \le x \le x_1$. In this section we show that for any memory utilization
β and all $dk/n \le x \le x_1$, $P_{fail}(x)$ is $o(1/n)$.

The analysis here is similar to the case above where t = n, however now we need to take into 
consideration the distribution of the vertices over the pages. 
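Let V be a given set of vertices, and let $v_i$ be the number of vertices in page i. The probability
that a random edge hits V is equal to

$$p_{hit}(V,t) = \left(\frac{\sum_{i=1}^{g}\binom{v_i}{k}}{\sum_{i=1}^{g}\binom{t}{k}}\right)^d.$$

Let $\hat{p}_{hit}(V,t) = \left(\frac{1}{g}\sum_{i=1}^{g}\left(\frac{v_i}{t}\right)^k\right)^d$ be an upper bound on $p_{hit}(V,t)$; it is an upper bound because $\binom{v_i}{k}/\binom{t}{k} \le (v_i/t)^k$ for $v_i \le t$.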
We are going to use the following lemma, which states that increasing the page size does not increase $\hat{p}_{hit}$.

Lemma 8. For any set of vertices V and any integer c, $\hat{p}_{hit}(V,t) \ge \hat{p}_{hit}(V,ct)$.

Proof. Since the function $f(v) = (v/t)^k$ is convex, for any sequence of c pages,

$$\frac{1}{c}\left(\left(\frac{v_1}{t}\right)^k + \cdots + \left(\frac{v_c}{t}\right)^k\right) \ge \left(\frac{v_1 + \cdots + v_c}{ct}\right)^k,$$

thus multiplying the page size by a factor c does not increase $\hat{p}_{hit}$. □

For convenience, we restrict our analysis to pages of size $t = k \cdot c$, where c is any integer. The worst case is obtained
when t = k, which is equivalent to the classical CUCKOO-DISJOINT hashing.
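The following is a quick numerical check of Lemma 8. It is our own sketch, and `p_hat_hit` and `merge_pages` are hypothetical names. It confirms that merging runs of c pages of size t into pages of size ct never increases the upper bound $\hat{p}_{hit}$, as the convexity argument predicts.

```python
from __future__ import annotations

import random


def p_hat_hit(page_counts: list[int], t: int, k: int, d: int) -> float:
    """Upper bound ((1/g) * sum_i (v_i/t)^k)^d on p_hit(V, t); needs v_i <= t."""
    g = len(page_counts)
    return (sum((v / t) ** k for v in page_counts) / g) ** d


def merge_pages(page_counts: list[int], c: int) -> list[int]:
    """View each run of c consecutive pages as one page of size c*t."""
    return [sum(page_counts[i:i + c]) for i in range(0, len(page_counts), c)]


rng = random.Random(0)
t, k, d, c = 4, 2, 2, 2
for _ in range(1000):
    counts = [rng.randint(0, t) for _ in range(12)]   # 12 pages, divisible by c
    merged = merge_pages(counts, c)
    assert p_hat_hit(counts, t, k, d) >= p_hat_hit(merged, c * t, k, d) - 1e-12
```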

We will now analyze $P_{fail}(x)$ and $P_{bad}(x)$ as $n \to \infty$. Recall that $\beta = m/n$ is the memory
utilization, $0 < \beta < 1$, and $x = v/n$.

Lemma 9. $P_{fail}(x) \le c_6(x) \cdot c_7^n(x)$,

where $c_6(x) = e \cdot x^{d-1}$ and $c_7(x) = e^{\frac{k+1}{k}x} \cdot x^{\frac{(d-1)k-1}{k}x}$.

Proof. See Appendix C. □

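Theorem 10. Let δ be a small constant, $0 < \delta \ll 1$. If $(d-1)dk \ge 3$, then for any load β:
1. If $dk/n \le x \le dk/n + \delta$, then $P_{fail}(x) \le O\left(n^{-((d-1)dk-1)}\right) = o(1/n)$.

2. If $dk/n + \delta \le x \le x_1 - \delta$, then $P_{fail}(x)$ decreases exponentially as $n \to \infty$.

Proof. For any memory utilization β, $P_{fail}(x) \le c_6(x) \cdot c_7^n(x)$,
where $c_6(x) = e \cdot x^{d-1}$ and $c_7(x) = e^{\frac{k+1}{k}x} \cdot x^{\frac{(d-1)k-1}{k}x}$.

Note that $c_6(x)$ and $c_7(x)$ are independent of β. Recall that $x_1 = \exp\left(-\frac{k+1}{(d-1)k-1}\right)$. Since $c_7(x) = \left(e^{k+1}x^{(d-1)k-1}\right)^{x/k}$, for $dk/n + \delta \le
x \le x_1 - \delta$ there exists a constant ε such that $c_7(x) < 1 - \epsilon$. We obtain that:

1. If $x \approx dk/n$, then $P_{fail}(x) \le \lim_{n\to\infty} c_6(x) \cdot c_7^n(x) = O\left(n^{-((d-1)dk-1)}\right) \le O(n^{-2}) = o(1/n)$.

2. If $dk/n + \delta \le x \le x_1 - \delta$, then $c_7(x) < 1 - \epsilon$, and $P_{fail}(x) \le c_6(x) \cdot c_7^n(x)$ decreases exponentially
as $n \to \infty$. □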
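As with $x_0$, the threshold $x_1$ follows from rewriting $c_7(x) = (e^{k+1} x^{(d-1)k-1})^{x/k}$. The minimal sketch below evaluates $x_1$ and checks where $c_7$ crosses one. The helper names `c7` and `x1` are ours.

```python
import math


def c7(x: float, d: int, k: int) -> float:
    """c7(x) = e^{(k+1)x/k} * x^{((d-1)k-1)x/k} from Lemma 9 (worst case t = k)."""
    return math.exp((k + 1) * x / k) * x ** (((d - 1) * k - 1) * x / k)


def x1(d: int, k: int) -> float:
    """c7(x) < 1 iff e^{k+1} x^{(d-1)k-1} < 1, i.e. x < exp(-(k+1)/((d-1)k-1))."""
    if (d - 1) * k - 1 <= 0:
        raise ValueError("need (d-1)k > 1")
    return math.exp(-(k + 1) / ((d - 1) * k - 1))


for d, k in [(2, 2), (2, 3), (3, 1)]:
    print(f"d={d} k={k}  x1={x1(d, k):.4f}")   # 0.0498, 0.1353, 0.1353
```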

3.2.2. $P_{bad}(x)$ Analysis for $x_1 \le x \le \beta$. In this section we find the maximum memory utilization
β such that for all $x_1 \le x \le \beta$, the probability that there exists a bad sub-graph is exponentially small.
We examine the set of sub-graphs that have a given distribution a of vertices over the pages,
$a = (a_0, \ldots, a_t)$, where $a_i$ is the number of pages that have i vertices. For example, when the page
size t was equal to n and the number of pages g was 1, the number of sub-graphs with v vertices
was $\binom{n}{v}$ and the corresponding a of those sub-graphs was $(0, \ldots, 0, a_v = 1, 0, \ldots, 0)$. By definition of
a,

$$\sum_{i=0}^{t} a_i = g. \qquad (3.1)$$

Lemma 11. When the page size t is a given constant, the number of different possible values of a
is polynomial in n.

Proof. We denote by #a the number of different possible values of a. Each of $a_1, \ldots, a_t$ lies in $\{0, \ldots, g\}$, and $a_0$ is then determined by (3.1), so

$\#a \le (g+1)^t = (n/t + 1)^t$. Since t is constant, #a is polynomial in n. □

Let $P_{bad}(a)$ be the probability that there exists a bad sub-graph with distribution a of vertices
over the pages. If $P_{bad}(a)$ is exponentially small for every a, then the union bound over the polynomially
many possible values of a is also exponentially small. Let $p_{bad}(a)$ be the probability that a
given sub-graph S(V,E) with $a = (a_0, \ldots, a_t)$ is bad.
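Let $\alpha = a/g$; α is a unit vector in the sense that $\sum_{i=0}^{t}\alpha_i = 1$. When the page size t is a constant, the probability that a random
edge hits V is:

$$p_{hit}(\alpha) = \left(\sum_{i=0}^{t}\alpha_i\frac{\binom{i}{k}}{\binom{t}{k}}\right)^d \qquad (3.2)$$

and the probability that a random edge connects V to exactly one vertex from outside of V is

$$p_1(\alpha) = d\left(\sum_{i=0}^{t}\alpha_i\frac{\binom{i}{k-1}(t-i)}{\binom{t}{k}}\right)\left(\sum_{i=0}^{t}\alpha_i\frac{\binom{i}{k}}{\binom{t}{k}}\right)^{d-1} \qquad (3.3)$$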
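Equations (3.2) and (3.3) depend on the sub-graph only through the page profile α. The sketch below evaluates them. The function names are hypothetical, and `alpha[i]` is assumed to be the fraction of pages containing i vertices of V. For t = 4, k = 2, d = 2 and every page holding two vertices of V, a single bucket lies inside V with probability 1/6.

```python
from math import comb


def p_hit_alpha(alpha: list[float], t: int, k: int, d: int) -> float:
    """(3.2): a bucket is a uniform page plus k uniform cells of it;
    all d buckets must lie inside V."""
    inside = sum(a * comb(i, k) for i, a in enumerate(alpha)) / comb(t, k)
    return inside ** d


def p_one_alpha(alpha: list[float], t: int, k: int, d: int) -> float:
    """(3.3): d - 1 buckets inside V and one bucket with exactly one cell outside."""
    inside = sum(a * comb(i, k) for i, a in enumerate(alpha)) / comb(t, k)
    one_out = sum(a * comb(i, k - 1) * (t - i) for i, a in enumerate(alpha)) / comb(t, k)
    return d * one_out * inside ** (d - 1)


alpha = [0, 0, 1, 0, 0]             # t = 4: every page holds exactly 2 vertices of V
print(p_hit_alpha(alpha, 4, 2, 2))  # (1/6)^2
print(p_one_alpha(alpha, 4, 2, 2))  # 2 * (4/6) * (1/6)
```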

Using the union bound we obtain that

$$P_{bad}(a) \le N(a) \cdot p_{bad}(a) \qquad (3.4)$$

where N(a) is the number of sub-graphs with $a = (a_0, \ldots, a_t)$:

$$N(a) = \binom{g}{a_0, \ldots, a_t}\prod_{i=0}^{t}\binom{t}{i}^{a_i} \qquad (3.5)$$

For the asymptotic behavior of N(a), we are going to use the following lemma:
Lemma 12. $\binom{g}{a_0, \ldots, a_t} \le \prod_{i=0}^{t}\left(\frac{g}{a_i}\right)^{a_i}$ (with the convention $0^0 = 1$).

Proof. See Appendix D. □
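Lemma 12 can be sanity-checked by exhaustive enumeration for small g. The sketch below is our own. The names `multinomial` and `lemma12_bound` are hypothetical, and the code uses `math.prod` from Python 3.8+. It compares both sides for every split of g = 6 into three classes.

```python
from math import factorial, prod


def multinomial(counts: list[int]) -> int:
    """g! / (a_0! ... a_t!) for counts a_0..a_t summing to g."""
    return factorial(sum(counts)) // prod(factorial(a) for a in counts)


def lemma12_bound(counts: list[int]) -> float:
    """prod_i (g/a_i)^{a_i}, using 0^0 = 1 so empty classes contribute 1."""
    g = sum(counts)
    return prod((g / a) ** a for a in counts if a > 0)


g = 6
for a0 in range(g + 1):
    for a1 in range(g + 1 - a0):
        counts = [a0, a1, g - a0 - a1]
        assert multinomial(counts) <= lemma12_bound(counts) * (1 + 1e-12)
```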
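Lemma 13. $P_{bad}(\alpha,\beta) \le O(1) \cdot c_8(\alpha,\beta) \cdot c_9^n(\alpha,\beta)$, where
$$c_8(\alpha,\beta) = p_{hit}(1 - p_1 - p_{hit})^{-1}\frac{\beta-x}{x}$$
and
$$c_9(\alpha,\beta) = \prod_{i=0}^{t}\binom{t}{i}^{\alpha_i/t}\prod_{i=0}^{t}\left(\frac{1}{\alpha_i}\right)^{\alpha_i/t}\left(\frac{\beta}{x}\right)^{x}\left(\frac{\beta}{\beta-x}\right)^{\beta-x}p_{hit}^{x}(1 - p_1 - p_{hit})^{\beta-x}.$$

Proof. See Appendix E. □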

Theorem 14. If $c_9(\alpha,\beta) < 1 - \epsilon$ for all α with $x_1 \le x \le \beta$, then $P_{bad}(\alpha,\beta)$ decreases exponentially as
$n \to \infty$. Any memory utilization β that satisfies the constraint is a lower bound on the possible
memory utilization.

Proof. The theorem follows directly from the inequality $P_{bad}(\alpha,\beta) \le O(1) \cdot c_8(\alpha,\beta) \cdot c_9^n(\alpha,\beta)$. □

Theoretical lower bounds of the memory utilization obtained from numerical solutions of the
constraint $c_9(\alpha,\beta) < 1$ are displayed in Figure 4.1.



4. Empirical Results 

The experiments were conducted with a similar protocol to the one described in [2]. In all
experiments the number of buckets d was two. The capacity of each bucket k was either two
or three. The size of the hash tables n was 1,209,600. The reported memory utilization β is the
mean memory utilization over twenty trials. The random hash functions were based on the Matlab
"rand" function with the twister method. Items were inserted into the hash table one by one until an
item could not be inserted. The results of both CUCKOO-CHOOSE-K and CUCKOO-OVERLAP 
were notably stable. In each case, the standard deviation was a few hundredths of a percent, so 
error bars would be invisible in the figure. Such strongly predictable behavior is appealing from 
a practical standpoint. Since we added a paging constraint, our results are not comparable to 
previous works that do not include a paging constraint. 
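The protocol above can be mimicked at a much smaller scale. The sketch below is our own illustration, not the authors' Matlab code, and differs from the described setup in several ways:

- It uses Python's `random` module instead of Matlab's twister generator.
- It uses a table of n = 2048 cells instead of 1,209,600.
- It runs a single trial rather than twenty.
- Its function names are hypothetical.

Each item draws d buckets, each consisting of a uniform page plus k distinct uniform offsets. This is CUCKOO-CHOOSE-K; setting t = k gives CUCKOO-DISJOINT. Insertion uses breadth-first search over displacement paths. The returned "examined" count, the number of cells dequeued by the search, is one possible definition of lookups and may differ from the one used for Figure 4.2.

```python
from __future__ import annotations

import random
from collections import deque


def make_item(rng: random.Random, g: int, t: int, k: int, d: int) -> list[int]:
    """Candidate cells of one item: d buckets, each k distinct cells of a uniform page."""
    cells: set[int] = set()
    for _ in range(d):
        page = rng.randrange(g)
        cells.update(page * t + off for off in rng.sample(range(t), k))
    return sorted(cells)


def bfs_insert(table: list, cells_of: list[list[int]], item: int) -> int | None:
    """Insert `item` via breadth-first search over displacement paths.
    Returns the number of cells examined, or None if no vacant cell is reachable."""
    parent: dict[int, int | None] = {c: None for c in cells_of[item]}
    queue = deque(parent)
    examined = 0
    while queue:
        cell = queue.popleft()
        examined += 1
        if table[cell] is None:
            # Walk back to the root, moving each occupant one step along the path.
            while parent[cell] is not None:
                prev = parent[cell]
                table[cell] = table[prev]
                cell = prev
            table[cell] = item
            return examined
        for nxt in cells_of[table[cell]]:      # where the occupant could go instead
            if nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


def fill_until_failure(n: int, t: int, k: int, d: int, seed: int = 0):
    """Insert fresh items until one cannot be placed; return (utilization, table, cells_of)."""
    assert n % t == 0 and t >= k
    rng = random.Random(seed)
    table: list[int | None] = [None] * n
    cells_of: list[list[int]] = []
    while True:
        cells_of.append(make_item(rng, n // t, t, k, d))
        if bfs_insert(table, cells_of, len(cells_of) - 1) is None:
            return (len(cells_of) - 1) / n, table, cells_of


if __name__ == "__main__":
    for t in (2, 8, 16):
        util, _, _ = fill_until_failure(2048, t, k=2, d=2, seed=1)
        print(f"t={t:2d}  utilization={util:.4f}")
```

Because this is a single small trial, the resulting utilization fluctuates more than the twenty-trial means reported above.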
Figure 4.1. Memory utilization vs. page size. Empirical CUCKOO-CHOOSE-K (green), empirical
CUCKOO-OVERLAP (red), approximation formula of empirical CUCKOO-CHOOSE-K (blue),
and theoretical lower bound of CUCKOO-CHOOSE-K (black). Left: k = 2, right: k = 3.

Figure 4.2. Number of lookups required to insert an item vs. memory utilization (left) and vs. page size
(right). In the left figure the page size is 8. In the right figure the memory utilization is 92%. The
left figure was smoothed with an averaging filter. The variance in the number of lookups required
to insert an item was much smaller in CUCKOO-CHOOSE-K.

Experiments show that CUCKOO-CHOOSE-K improves memory utilization significantly even
for a small page size t and a small bucket capacity k when compared to the classical cuckoo hashing
CUCKOO-DISJOINT. It outperforms CUCKOO-OVERLAP as well. Recall that CUCKOO-DISJOINT
is the extreme case of CUCKOO-CHOOSE-K and CUCKOO-OVERLAP when the
page size is equal to the bucket size k. The memory utilization $\beta_{choose-k}$ converges very quickly to
its maximum value. For example, $\beta_{choose-2}(t = 16) = 0.9763$ is almost equal to $\beta_{choose-2}(t =
1{,}209{,}600) = 0.9767$, whereas $\beta_{overlap-2}(t = 16) = 0.9494$ and $\beta_{disjoint-2} = 0.8970$.



A CUCKOO HASHING VARIANT WITH IMPROVED MEMORY UTILIZATION AND INSERTION TIME 9 



The empirical memory utilization can be approximated very accurately by the following formulas: 

(4.1) {t) « 0.977 • {l (^)-'-'"' 

(4.2) Pchoose-:^ {t) « 0.997 • (^1 - (y^t)"' ''' 

The maximum approximation error is 0.0011 for k = 2 and 0.0015 for k = 3.

The empirical results and their approximations are displayed in Figure 4.1, together with the empirical
results of CUCKOO-OVERLAP. The memory utilization for $kd \ge 6$ rapidly approaches one (not
displayed).

CUCKOO-CHOOSE-K outperforms CUCKOO-OVERLAP not only in memory utilization but
also in the number of iterations required to insert a new item. Figure 4.2 illustrates the number
of iterations required to insert a new item when the hash table is 92% full and we use breadth-first
search to look for a vacant position: $\#itr_{choose-2}(t = 8) = 52$, whereas $\#itr_{overlap-2}(t =
8) = 545$. Note that in these simulations we did not limit the number of insert iterations, and we
continued to insert items as long as we could find free locations. Most applications limit the number
of insertion iterations and maintain a low memory utilization in order to find a free location easily.
If an empty position is not found within a fixed number of iterations, a rehash is performed or the
item is placed outside of the cuckoo array.
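The bounded alternative mentioned in the last sentences can be sketched as follows. This is a hypothetical illustration, not the procedure used in the experiments. It swaps in a random-walk insertion with at most `max_kicks` evictions. On giving up, it puts the homeless item in a small overflow list (a "stash"), which is one way to place an item "outside of the cuckoo array". The function names and the value of `max_kicks` are our assumptions.

```python
from __future__ import annotations

import random


def make_item(rng: random.Random, g: int, t: int, k: int, d: int) -> list[int]:
    """Candidate cells of one item: d buckets, each k distinct cells of a uniform page."""
    cells: set[int] = set()
    for _ in range(d):
        page = rng.randrange(g)
        cells.update(page * t + off for off in rng.sample(range(t), k))
    return sorted(cells)


def insert_bounded(table, cells_of, item, stash, rng, max_kicks=100) -> bool:
    """Random-walk insertion with at most `max_kicks` evictions.
    Returns True if the chain ended in a free cell; otherwise the item that is
    homeless at that point is appended to `stash` and False is returned."""
    current = item
    for _ in range(max_kicks):
        free = [c for c in cells_of[current] if table[c] is None]
        if free:
            table[free[0]] = current
            return True
        cell = rng.choice(cells_of[current])
        table[cell], current = current, table[cell]   # evict the occupant
    stash.append(current)
    return False


rng = random.Random(5)
n, t, k, d = 1024, 8, 2, 2
table = [None] * n
cells_of = []
stash = []
for item in range(n // 2):                 # fill to 50% load
    cells_of.append(make_item(rng, n // t, t, k, d))
    insert_bounded(table, cells_of, item, stash, rng)
print("stored:", sum(x is not None for x in table), "stash:", len(stash))
```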

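Appendix A. Proof of Lemma 4
Here we prove that $P_{bad}(x) \le c_0(x) \cdot (c_1(x))^n$, where $c_0(x) = e \cdot x^{dk-1}$ and $c_1(x) = e^{2x} \cdot x^{(dk-2)x}$.

Proof. Recall that $\beta = m/n$ is the memory utilization, $0 < \beta < 1$, and $x = v/n$, $dk/n \le x \le \beta$ (if $v > m$
or $v < dk$, then the sub-graph S cannot fail).
$P_{bad}(v) \le \binom{n}{v} \cdot p_{bad}(v)$, where

$$p_{bad}(v) = \binom{m}{v+1}p_{hit}^{v+1}(1 - p_1 - p_{hit})^{m-(v+1)}, \qquad p_{hit}(v) = \left(\frac{\binom{v}{k}}{\binom{n}{k}}\right)^d.$$

$\binom{n}{v}$, $p_{bad}$, $\binom{m}{v+1}$ and $p_{hit}$ are bounded by:

$$\binom{n}{v} \le \left(\frac{en}{v}\right)^v = \left(\frac{e}{x}\right)^{xn} \qquad (A.1)$$

$$p_{bad}(v) = \binom{m}{v+1}p_{hit}^{v+1}(1 - p_1 - p_{hit})^{m-(v+1)} \le \binom{m}{v+1}p_{hit}^{v+1} \qquad (A.2)$$

$$\binom{m}{v+1} \le \left(\frac{em}{v+1}\right)^{v+1} \le \left(\frac{e\beta}{x}\right)^{xn+1} \qquad (A.3)$$

$$p_{hit} \le \left(\frac{v}{n}\right)^{dk} = x^{dk} \qquad (A.4)$$

And we get, using $\beta \le 1$, that

$$P_{bad}(v) \le \binom{n}{v}p_{bad}(v) \le \left(\frac{e}{x}\right)^{xn}\left(\frac{e\beta}{x}\right)^{xn+1}\left(x^{dk}\right)^{xn+1} \le c_0(x) \cdot c_1^n(x). \qquad (A.5)$$

□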



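Appendix B. Proof of Lemma 6
Here we prove that for all $x_0 \le x \le \beta$, $P_{bad}(x,\beta) \le O(1) \cdot c_5^n(x,\beta)$, where
$$c_5 = \left(\frac{1}{1-x}\right)^{1-x}\left(\frac{1}{x}\right)^{x}\left(\frac{\beta}{\beta-x}\right)^{\beta-x}\left(\frac{\beta}{x}\right)^{x}x^{dkx}\left(1 - dk(1-x)x^{dk-1} - x^{dk}\right)^{\beta-x}.$$
Proof. Recall that:

$$P_{bad}(v) \le N(v) \cdot p_{bad}(v) = \binom{n}{v}\binom{m}{v+1}p_{hit}^{v+1}(1 - p_1 - p_{hit})^{m-(v+1)}. \qquad (B.1)$$

$\binom{n}{v}$, $p_{hit}$ and $p_1$ are bounded by:

$$\binom{n}{v} \le \left(\frac{n}{n-v}\right)^{n-v}\left(\frac{n}{v}\right)^{v} = \left[\left(\frac{1}{1-x}\right)^{1-x}\left(\frac{1}{x}\right)^{x}\right]^n \qquad (B.2)$$

$$\left(x - \frac{k}{n}\right)^{dk} \le p_{hit} \le x^{dk} \qquad (B.3)$$

$$dk(1-x)\left(x - \frac{k}{n}\right)^{dk-1} \le p_1 \qquad (B.4)$$

Since $x \ge x_0 \gg 1/n$, we neglect the term $1/n \ll x$ in the following approximation of $\binom{m}{v+1}$, and
we neglect the term $k/n \ll x$ in the lower bounds for $p_{hit}$ and $p_1$. As $n \to \infty$, these terms contribute
to $P_{bad}(v)$ factors which are bounded by O(1).

$$\binom{m}{v+1} \le \left(\frac{m}{v+1}\right)^{v+1}\left(\frac{m}{m-(v+1)}\right)^{m-(v+1)} \le O(1)\,\frac{\beta-x}{x}\left[\left(\frac{\beta}{\beta-x}\right)^{\beta-x}\left(\frac{\beta}{x}\right)^{x}\right]^n \qquad (B.5)$$

$$(1 - p_1 - p_{hit})^{m-(v+1)} \le O(1)\left(1 - dk(1-x)x^{dk-1} - x^{dk}\right)^{(\beta-x)n-1} \qquad (B.6)$$

The left-hand inequalities are known bounds and are also a special case of the more general
inequality we prove later in Lemma 12.
We get that:

$$P_{bad}(v) \le \binom{n}{v} \cdot p_{bad}(v) \le O(1) \cdot c_4(x,\beta) \cdot c_5^n(x,\beta), \qquad (B.7)$$

where

$$c_4(x,\beta) = (\beta - x)\,x^{dk-1}\left(1 - dk(1-x)x^{dk-1} - x^{dk}\right)^{-1} \qquad (B.8)$$

and

$$c_5(x,\beta) = \left(\frac{1}{1-x}\right)^{1-x}\left(\frac{1}{x}\right)^{x}\left(\frac{\beta}{\beta-x}\right)^{\beta-x}\left(\frac{\beta}{x}\right)^{x}x^{dkx}\left(1 - dk(1-x)x^{dk-1} - x^{dk}\right)^{\beta-x}. \qquad (B.9)$$

Since $c_4 \le O(1)$, we get $P_{bad}(x) \le O(1) \cdot c_5^n(x,\beta)$. □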
Appendix C. Proof of Lemma 9

Here we prove $P_{fail}(v) \le c_6(x) \cdot c_7^n(x)$,

where $c_6(x) = e \cdot x^{d-1}$ and $c_7(x) = e^{\frac{k+1}{k}x} \cdot x^{\frac{(d-1)k-1}{k}x}$.

Proof. According to Lemma 8, for any set of vertices V, $\hat{p}_{hit}$ does not increase when the page size t is
multiplied by an integer. For simplicity, we restrict our analysis to pages of size $t = k \cdot c$, where c is
any integer. The worst case therefore is when t = k, which is equivalent to the classical CUCKOO-DISJOINT
hashing. We use the union bound to obtain $P_{fail}(v) \le N(v) \cdot p_{fail}(v)$, where N(v) is
the number of sub-graphs with v vertices, where each vertex of the sub-graph is hit by at least one
edge, and $p_{fail}(v)$ is the probability that a given sub-graph S(V,E) was hit by more than v edges.
By definition of $p_{fail}$:

$$p_{fail}(v) = \sum_{j=v+1}^{m}\binom{m}{j}p_{hit}^{j}(1 - p_{hit})^{m-j} \le \binom{m}{v+1}p_{hit}^{v+1} \qquad (C.1)$$

When the page size t equals k, an element can be placed either in all of the locations of a page
or in none of them, so a sub-graph in which every vertex is hit consists of v/k whole pages. We get:

$$p_{hit} = \left(\frac{v/k}{n/k}\right)^d = x^d \qquad (C.2)$$

and

$$N(v) \le \binom{n/k}{v/k} \le \left(\frac{e\,n/k}{v/k}\right)^{v/k} = \left(\frac{e}{x}\right)^{xn/k} \qquad (C.3)$$

$$\binom{m}{v+1} \le \binom{n}{v+1} \le \left(\frac{en}{v+1}\right)^{v+1} \le \left(\frac{e}{x}\right)^{xn+1} \qquad (C.4)$$

We get that:

$$P_{fail}(v) \le N(v) \cdot p_{fail}(v) \le N(v)\binom{m}{v+1}p_{hit}^{v+1} \le c_6(x) \cdot c_7^n(x), \qquad (C.5)$$

where $c_6(x) = e \cdot x^{d-1}$ and $c_7(x) = e^{\frac{k+1}{k}x} \cdot x^{\frac{(d-1)k-1}{k}x}$. □

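Appendix D. Proof of Lemma 12
The proof of $\binom{g}{a_0, \ldots, a_t} \le \prod_{i=0}^{t}\left(\frac{g}{a_i}\right)^{a_i}$ is a generalization of the proof given in [5]. For any positive
integer t and any non-negative integer g:

$$(y_0 + \cdots + y_t)^g = \sum_{a_0 + \cdots + a_t = g}\binom{g}{a_0, \ldots, a_t}\prod_{i=0}^{t}y_i^{a_i},$$

where the summation is taken over all sequences of non-negative integer indices $a_0$ through $a_t$ such that the
sum of all $a_i$ is g. For the special case where $y_i = a_i$ is a non-negative integer and $\sum_{i=0}^{t} a_i = g$,
keeping only the term with exponents $(a_0, \ldots, a_t)$, we get:

$$\binom{g}{a_0, \ldots, a_t}\prod_{i=0}^{t}a_i^{a_i} \le (a_0 + \cdots + a_t)^g = g^g \qquad (D.1)$$

So

$$\binom{g}{a_0, \ldots, a_t} \le \frac{g^g}{\prod_{i=0}^{t}a_i^{a_i}} = \prod_{i=0}^{t}\left(\frac{g}{a_i}\right)^{a_i}. \qquad (D.2)$$

□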
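Appendix E. Proof of Lemma 13
Here we prove

$P_{bad}(\alpha,\beta) \le O(1) \cdot c_8(\alpha,\beta) \cdot c_9^n(\alpha,\beta)$, where
$c_8(\alpha,\beta) = p_{hit}(1 - p_1 - p_{hit})^{-1}\frac{\beta-x}{x}$.

Proof. Recall that α is a unit vector and v and x are equal to the following functions of α:

$$\sum_{i=0}^{t}\alpha_i = 1 \qquad (E.1)$$

$$v = \sum_{i=0}^{t}a_i \cdot i = g\sum_{i=0}^{t}\alpha_i \cdot i = \frac{n}{t}\sum_{i=0}^{t}\alpha_i \cdot i \qquad (E.2)$$

$$x = \frac{v}{n} = \frac{1}{t}\sum_{i=0}^{t}\alpha_i \cdot i. \qquad (E.3)$$

$P_{bad}(v) \le N(v) \cdot p_{bad}(v)$, where

$$N(v) = \binom{g}{a_0, \ldots, a_t}\prod_{i=0}^{t}\binom{t}{i}^{a_i} \qquad (E.4)$$

$$p_{bad}(v) = \binom{m}{v+1}p_{hit}^{v+1}(1 - p_1 - p_{hit})^{m-(v+1)} \qquad (E.5)$$

$$p_{hit} = \left(\sum_{i=0}^{t}\alpha_i\frac{\binom{i}{k}}{\binom{t}{k}}\right)^d \qquad (E.6)$$

$$p_1 = d\left(\sum_{i=0}^{t}\alpha_i\frac{\binom{i}{k-1}(t-i)}{\binom{t}{k}}\right)\left(\sum_{i=0}^{t}\alpha_i\frac{\binom{i}{k}}{\binom{t}{k}}\right)^{d-1} \qquad (E.7)$$

Using Lemma 12 (with $a_i = g\alpha_i$ and $g = n/t$) we get:

$$N(v) \le \prod_{i=0}^{t}\left(\frac{g}{a_i}\right)^{a_i}\binom{t}{i}^{a_i} = \left[\prod_{i=0}^{t}\binom{t}{i}^{\alpha_i/t}\prod_{i=0}^{t}\left(\frac{1}{\alpha_i}\right)^{\alpha_i/t}\right]^n \qquad (E.8)$$

and

$$\binom{m}{v+1} \le O(1)\,\frac{\beta-x}{x}\left[\left(\frac{\beta}{x}\right)^{x}\left(\frac{\beta}{\beta-x}\right)^{\beta-x}\right]^n \qquad (E.9)$$

Finally we get:

$$P_{bad}(v) \le N(v) \cdot p_{bad}(v) \le O(1) \cdot c_8(\alpha,\beta) \cdot c_9^n(\alpha,\beta) \qquad (E.10)$$

where

$$c_8(\alpha,\beta) = p_{hit}(1 - p_1 - p_{hit})^{-1}\frac{\beta-x}{x} \qquad (E.11)$$

and

$$c_9(\alpha,\beta) = \prod_{i=0}^{t}\binom{t}{i}^{\alpha_i/t}\prod_{i=0}^{t}\left(\frac{1}{\alpha_i}\right)^{\alpha_i/t}\left(\frac{\beta}{x}\right)^{x}\left(\frac{\beta}{\beta-x}\right)^{\beta-x}p_{hit}^{x}(1 - p_1 - p_{hit})^{\beta-x}. \qquad (E.12)$$

□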